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Abstract 


An efficient, direct, second order solver for the discrete solution of a class of two- 
dimensional separable elliptic equations on the sphere is presented. The method involves a Fourier 
transfoimation in longitude and a direct solution of the resulting coupled second order finite- 
difference equations in latitude. The solver is made efficient by vectorizing over longitudinal 
wavenumber and by using a vectorized fast Fourier transform routine. It is evaluated using a 
prescribed solution method and compared with a multigrid solver and the standard direct solver 

from FISHPAK. 
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I. Introduction 


Numerical techniques used in global atmospheric models have evolved over the past several 
decades. Models with explicit time differencing generally require very small time steps in order to 
avoid linear computational instability associated with fast moving gravity waves, particularly 
because of the convergence of meridians near the pole. The introduction of semi-implicit time 
differencing (Robert, 1969) relaxed the requirement for linear computational stability and allowed 
larger time steps relative to explicit schemes. Further advances occurred through the introduction 
of semi-Lagrangian semi-implicit time differencing schemes (see Staniforth and Cote, 1991 and 
Bates et al., 1992 for a comprehensive review of the evolution of the semi-Lagrangian approach). 

The implicit (or semi-implicit) time differencing, in general, leads to an elliptic equation (two- 
or three-dimensional and separable or non-separable depending on the formulation) on the sphere 
(Tempjerton and Staniforth, 1987; McDonald and Bates, 1989; Tanguay et al., 1989; Bates et al., 
1990; Barros et al., 1989). Thus for the implicit schemes to be more economical compared to their 
explicit counterparts, efficient solvers are of paramount importance. 

Algorithms for the direct solution of separable elliptic equations have been around for awhile. 
Swarztrauber (1974a) developjed a method of direct solution of separable elliptic equations by 
extending the stabilized cyclic reduction algorithm. When the coefficients of the elliptic equations 
are independent of one of the dimensions (which is the class of equations we are considering here), 
the problem can be solved with the application of Fourier analysis. Hockney (1965) used this 
approach to obtain the direct solution of Poisson's equation. Lindzen and Kuo (1969) suggested 
the use of a Fourier transform in one direction combined with the direct solution of the resulting 
second order ordinary differential equations in the other, for more general elliptic equations. Le 
Bail (1972) applied fast Fourier transforms (FFT's) to solve a class of partial differential 
equations. 
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Until recently, fast direct solvers on the sphere have been available for the discrete Poisson 
or Helmholtz type elliptic equations (Swarztrauber and Sweet, 1973, 1975; Sweet, 1973, 1974; 
Swarztrauber, 1974; Adams et al., 1980). Thus many implicit time differencing schemes have 
been geared towards obtaining such elliptic equations (e.g. McDonald and Bates, 1989). Bates et 
al., (1990) (hereafter BSHB) were the first to obtain a more general elliptic equation for their 
vector semi-Lagrangian semi-implicit time differencing scheme in the global shallow water 
framework. Because no efficient direct solver was available at that time, they used the multigrid 
solver developed by Banos (1991); multigrid methods can be applied to more general elliptic 
equations and more complex domains (Phillips, 1984; Fulton et al.,1986; Banos et al., 1989, 

Bates et al., 1990; Banos, 1991). However, when the scheme of BSHB was extended to a global 
multi-level primitive equation model (Bates et al., 1992), it was soon realized that the original 
multigrid solver was not efficient enough. This led to the development of the FFT based direct 
solver presented here. 

It should be emphasized that the idea of using FFTs is not new; they are routinely used to 
solve Poisson and Helmholtz equations on the sphere. Recently, Cote and Staniforth (1990) have 
also described and applied this approach to a more general elliptic problem. 

Our solver is based on a latitude/longitude grid over the sphere. The solution method 
involves a Fourier decomposition in longitude to separate that dependency. This reduces the 
problem to a set of coupled ordinary differential equations in the latitudinal direction for each 
longitudinal wavenumber. These coupled equations are solved using the procedure described in 
Lindzen and Kuo (1969) and Chao (1979). 

In Section II we present the elliptic equation, the separation of longitudinal dependency, the 
latitudinal discretization, the method of solution for two coupled ordinary differential equations and 
a brief description of the coding strategy and the calling sequence for the solver. In Section HI, we 
validate the algorithm and compare it with the multigrid method used by BSHB. A summary is 
given in Section IV. The Appendix contains a listing of the source code for the solver. 
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II. Description of the solver 

a) The elliptic equation 

In this report we present a fast and direct algorithm to solve the following class of two- 
dimensional elliptic equations on the sphere: 
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where a is the radius of the sphere, X and 0 are the longitude and latitude, respectively, <|> is the 
solution, and F is the forcing (which is known). Here the coefficients c,(0), c 2 (0), c 3 (0), c 4 (0), 
c 5 (0), and, c 6 (0) are at most functions of latitude. If any of these coefficients is also a function 
of longitude, then it is nearly impossible to write an efficient direct solver for such an equation in 
which case the multigrid method would be preferable (Phillips, 1984). 

b) Separation of longitudinal dependency 

We solve (1) on a uniform longitude and latitude grid on the sphere. Any field defined on 
this grid can be expanded into a finite Fourier series in the longitudinal direction X. Supposing that 
we have I equally spaced grid points along a latitude circle, any function <J> can be expanded in a 
finite Fourier series as 

1/2 A 

<KX n ) =£♦(*) (2) 

- 1/2 
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A ■ A 

Here <{>(&) is the complex amplitude for wavenumber k and <J>(—£) is the complex conjugate of 
<j)(fc) where i = 4-1. The complex amplitude can be obtained by 

1 1 

Since we are considering real data, for k =0 (i.e. the longitudinal mean part) and k =1/2, only the 
real part of the amplitude functions exist. 

We first consider the case when 0<k <1/2. For any wavenumber k , eq. (1) reduces to a 

second order ordinary differential equation in 0 for the complex amplitude. Without loss of 

generality, let us consider a solution to (1) of the form 

<j> = ie*\andF = Fe i * ( 4 ) 

where 4> = <t> r +iV, F = F r + /F\ After substituting (4) into (1) we obtain 
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Se paratin g (5) into real and imaginary parts, we obtain the following two coupled second order 
ordinary differential equations: 
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Equations (6) and (7) determine the latitudinal structures of the amplitude for loQ. When k = 0 (ix. 
the longitudinal mean part) only the real part exists and we obtain a single second order ordinary 
differential equation. Denoting the longitudinal mean part by an overbar, we have 


?^ <c ’ (9)cose H )+ i^i 


(c 5 (0)cos0 <|>) + c 6 (0)<j> = F. 


( 8 ) 


Finally, since the imaginary part is zero for k =1/2, the problem reduces to a single second 
order ordinary differential equation governing the real part This is obtained by ignoring the terms 
with imaginary part in (6). 


c) Latitudinal discretization 

We represent the distance from the south pole to the north pole of the sphere by a set of J 
equally spaced grid points. Each grid point is referred to by an integer index j, with j=l and j=J 
representing the south and north poles, respectively. For interior points of the domain, we 
approximate the latitudinal derivatives in (6) and (7) by second order finite differences. Since the 
elliptic operator is not formally defined at the poles, we use integral definitions there, as detailed 
later. Thus for the interior, we assume 
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where l<j<J. As shown by Barros (1991), the same discretization can be obtained through the 
finite volume discretization approach. 

The centered differencing cannot be applied at the polar singularities. Therefore, at the poles 
we follow the integral approach used by Barros (1991) and BSHB. Since the poles are singular 
points, only the longitudinally symmetric component (k =0) is non zero there. Thus for the 
longitudinally asymmetric components ( k >0), the boundary conditions at the poles simply 

become, 

<t>J=0 and <{>‘=0 

To obtain the boundary conditions for the longitudinally symmetric part (which has only real part), 
we integrate (1) over the polar caps lying within A0/2 from the poles. Then the terms with X, 
derivatives vanish due to cyclic continuity in the longitudinal direction. At the north pole, we then 

obtain, 

[f 2 _L-i-( c ,(e)cos0^) + -^^(c 5 (0)cos0<|))+a 2 c 6 (0)<l)] cos0d0dX 

'(*/2- a8/2) qos 0 30 30 cos 0 30 



6 


F a 2 cos0d0dX. 
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From (12) we obtain 
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Now the derivative of <{> in the first term is evaluated by a centered difference and <J> at J-l/2 is 
approximated by the average of its values at J and J-l. The second term in (13) is approximated by 

the mid-point rule applied to the spherical cap. Dividing by the approximation to the area of the 
spherical cap, a 2 cos0j_^ A0 (tc / 2), and denoting the longitudinal average by an overbar, we obtain 
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for the south pole. The above discretizations are second order accurate and maintain the 
conservative properties of the derivatives in the continuous form. 


(15) 


d) Solution of two simultaneous second order finite-difference equations 


In the previous section, we reduced the problem to solving (9) and (10) subject to (1 1) (for 
l</t<I/2-l), to solving the discrete version of (8) subject to (14) and (15) (for k= 0) and to solving 
(9) without the imaginary part subject to (11) (for k =1/2). To solve (9) and (10) subject to (1 1), 
we follow the approach of Lindzen and Kuo (1969) and Chao (1979). For the sake of 
completeness, we repeat the method here as it is applied to our problem. Details of the method of 
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solution for k =0 and k =1/2 will not be presented here, however, the solutions can be obtained by 
following the procedure below (or see Lindzen and Kuo, 1969). 

For simplicity, we rewrite (9) and (10) as 



a;*;., + b;*; + c;*;, + a>‘ h + b$ + q*'., = f; 

(16) 

and 
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(17) 

for j=2, 3, . 

... J-l. The coefficients in (16) and (17) are defined from (9) and (10). 

We seek the 

solutions to (16) and (17) in the form 
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where the a’s. P's and Ys are new variables yet to be determined. Evaluating (18) and (19) at j-1 

and substituting into (16) and (17), we obtain 
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Eliminating <{>’ from (20) and (21) and comparing the result with (18), we obtain 


«j = -( c jPj “ R j a )) / (ajp) - P>‘), 
a) = -(Cjp- - R‘a‘) / (a'p‘ - p'a‘), 
P'= (f/Pj — fj a j) / (ajpj-p.aj). 


(28) 

(29) 

(30) 


Similarly .eliminating <}>' from (20) and (21) and comparing the result with (19), we obtain 


and 
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Thus, if we know aj, aj, (3|, p{ , y[, and yj (which can be obtained from the lower boundary 
condition), then all of the a’s, P’s and Ys are readily obtained for each j. In the present problem, 
the boundary condition (11) implies that all af = a| = P[ = pj = y[ = yj = 0. 

Equations (18) and (19) can then be used to obtain <j>J and <J>‘ at all j=J-l, J-2....1, provided 
<|>J and <bj are known, which can be obtained from the upper boundary condition. In the present 
problem, 4 >; = 4 >i = 0 . 


e) coding strategy 

The first step of the solution procedure involves the use of a forward FFT to obtain the 
complex amplitudes of the forcing function F that depend only on latitude. Then for each 
wavenumber, the complex amplitude of the solution is obtained by solving the coupled second 
order difference equations. Finally we use a backward FFT to obtain the solution. 

We made this procedure very efficient by using the CRAY subroutine RFFTMLT to 
transform (both forward and backward) all latitudes simultaneously . In addition, we wrote the 
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code to solve two coupled second order difference equations by vectorizing over longitudinal 
wavenumbers. This vectorization coupled with the vectorized FFT package makes our direct 
solver very efficient. 

It should be pointed out that if the longitudinal derivatives are approximated by finite 
differences, then the definitions of k and k 2 should be appropriately changed in all of the 
equations above . Also the FFT has a restriction on I, namely that 1= 2 p x3 q x5 r , where p, q and r 
are integers. However, there is no restriction on the choice of J. 

The complete listing of the FORTRAN code for the solver is given in the Appendix. The 
c allin g sequence for the solver is as follows: 

CALL ELLSOL(IMA, IM, JM, AE, INIT1, INIT2, DEFF, Cl, C2, C3, C4, C5, C6, FI, FO 

, WSV, WRK, IX) 

where 

IMA is the leading dimension of input and output arrays FI and FO. 

IM is the longitudinal domain over which the solution is desired. 

(IM should be less than or equal to IMA) 

JM is the number of grid points from the south pole to the north pole. 

AE is the radius of the sphere. 

INIT1 is a logical variable, true only the first time for a given IM, JM and AE. 

INTT2 is a logical variable, used only when INITl is false. It is true whenever the coefficients 
of the elliptic equation change (except for C6). 

DIFF is a logical variable, true if wavenumber k is based on finite-difference, false otherwise. 

Cl, C2, C3, C4, C5, C6 are the coefficients of equation (1) and have dimension JM. 

Cl, C2, C4, and C6 are defined at the grid points j and C3 and C5 are defined at the 
midpoints between j and j+1 . C6 can change any time. If any other coefficients change, 
then INIT2 should be true. 
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FI is an input array of dimension (IMA,JM) containing the forcing F in the locations 1=1, IM , 
and J=1,JM. 

FO is an output array of dimension (IMAJM) containing the solution 4> in the locations 1=1, IM 
and J=1,JM. 

WSV is an array whose dimension is at least [4*IM+9*JM+7]. This array stores some 

constants to be used in subsequent calls to ELLSOL. It should not be overwritten unless the 
next call to ELLSOL has INTTl true. 

WRK is a work array whose dimension is at least [14 * IM * JM] 

IX is an integer array of dimension IM needed for the FFT (it should not be overwritten) 

III. Evaluation of the solver 

In this section, we present results from some tests to evaluate the solver. Since it is difficult 
to find analytical solutions to (1) against which we can compare the numerical solutions, we adopt 
the following "prescribed solution" procedure. Under this procedure, we assume a solution apriori 
and apply the differential operator on the left hand side of (1) to obtain the forcing F. Then we 
obtain the numerical solution for this forcing and compare the results with the original assumed 
solution. Here we consider the following simple function for <t>: 

<j>i j = 5.0xl0 4 + l.OxlO 3 cos0 j cos(27ti / 1) ^4) 

where i and j are longitudinal and latitudinal indices, respectively. A contour map of this function 
is shown in Fig. 1. The forcing Fy is computed using a second order accurate finite-difference 
operator corresponding to the left hand side of (1). In the following, we consider two cases. In 
case 1 we apply the solver to the elliptic equation of BSHB and compare the results with those 
obtained from the multigrid solver. In case 2 we apply it to the Poisson equation and compare the 
results with those obtained from FISHPAK. 
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The elliptic equation of BSHB is obtained by setting 

Ci(6) = G, c 2 (0) = O.O, c 3 (0) = G 

C4(0)= _^p, c 5 <e)=o.o, c 6 (e)=-[(Ai/2) 2 ir 1 

ao0 

where 

G = [l + F 2 ] -1 , and F = At£2sin0. 


Here 4» =50000 is a mean value, At=3600 s, and Q is the earth's rotation rate. The 2D multigrid 
solver of BSHB is used in the Full-Multigrid mode. We use a single V-cycle with one relaxation 
sweep both before and after the coarse grid correction and eight relaxation steps on the coarsest 
grid. These are identical to those used in BSHB and more details are available in that paper. 

We solved the above problem for various resolutions ranging from (I J) = (48,25) to 
(IJ)=(768,385). Figures 2a and 2b show the numerical solution for the direct method and the 
multigrid method, respectively with (I J)=(96,49). The solutions appear to be almost identical to 
that shown in Fig. 1. However, the accuracy of the solution is revealed in Figs. 3a and 3b which 
show the difference between the exact and the numerical solutions for both solvers. Notice that in 
Fig. 3a the error is multiplied by a factor of 10® while in Fig. 3b, it is multiplied by a factor of 10. 
Thus in this case the direct solver is several orders of magnitude more accurate. Similar results 
were also found at other resolutions (not shown). Furthermore, the accuracy of the multigrid 
solver did not improve significantly when both the number of V-cycles and the number of 
relaxation sweeps were increased. However, we do recognize that the level of accuracy of the 
spatial discretization should be the level of accuracy desired for any problem. Nevertheless, an 
efficient direct solver is always preferable since its solution is close to machine accuracy. 

We next examine the efficiency of the solvers. For this purpose, we present in Fig. 4 the 
CPU time (t) taken for 500 calls to the solver on a single processor of the CRAY YMP as a 
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function of the total number (N) of grid points (i.e. IxJ). In Fig. 4a the axes are linear while in 
Fig. 4b they are logarithmic. The timings arc also presented in Table 1. Clearly the direct solver is 
faster at all resolutions examined. Also, note that for both solvers, the time (t) is almost a linear 
function of the total number of grid points (N). 

Table 1: CPU time (t) in seconds for 500 calls to the direct solver, the multigrid solver and the 

solver from FISHPAK on a single processor of the CRAY YMP as a function of 
resolution. 


(i.n 

direct solver 

(48,25) 

0.513 

(72,46) 

1.110 

(96,49) 

1.401 

(144,91) 

4.062 

(192,97) 

5.253 

(288,181) 

14.464 

(384,193) 

19.563 

(576,361) 

55.889 

(768,385) 

77.998 


multigrid solver FISHPAK 

2.323 3.14 

7.008 12.615 

23.301 53.914 

85.588 241.579 

326.955 


case 2 

Now we consider the Poisson equation which is obtained by setting Cj(0) = c 3 (0) = 1 and 
c 2 (0) = c 4 (0) = c 5 (0) = c 6 (0) = 0. It should be pointed out that when c 6 (0) = 0 we cannot 

determine the constant part and thus there is no unique solution. In addition, if the forcing has a 
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global mean component, then there is no solution to the problem. Therefore, in our solver we 
remove the global mean from the forcing F whenever c 6 (0) = 0 . This is equivalent to the 

perturbation method of Swarztrauber (1974b). 

We used the forcing as in case 1 to compare the numerical solution with the assumed one. 
The errors in the numerical solution (not shown) are of the order 10'^ or less. Similar results are 
found for other resolutions. 

For this case, we compare the efficiency of our direct solver with the direct solver from the 
FISHPAK package (Adams et al., 1980). As before, we show the CPU time (t) taken for 500 
calls to both solvers as a function of the total number (N) of grid points (Fig. 5). Again, our direct 
solver is significantly faster at all resolutions and there is a noticeable divergence between the two 
curves as the resolution increases. 

We also confirmed that our solver works equally well for a general case in which all of the 
coefficients in (1) are nonzero functions of 0. Finally, we repeated each case above by assuming a 
prescribed solution made up of random numbers. We found for all cases that the errors were 
comparable to those discussed above. 

IV. Summary 

An efficient, direct solver for the discrete solution of a class of two-dimensional separable 
elliptic equations on the sphere has been discussed. It is based on a Fourier decomposition in 
longitude and a direct solution of the resulting coupled second order finite-difference equations in 
latitude. These equations are solved following the approach of Lindzen and Kuo (1969) and Chao 
(1979). 

For the elliptic equation of BSHB we find that the direct solver is both more efficient and 
accurate than the multigrid solver at all resolutions. For the special case of Poisson's equation we 
find, at all resolutions, that the direct solver presented here is more efficient than that available in 
FISHPAK (Adams et al.,1980). 
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Thus our solver is both accurate and efficient and general enough that it could be used for . 
any separable elliptic equation on the sphere with coefficients independent of longitude. It can also 
be applied to limited areas on the sphere if cyclic boundary conditions are invoked in longitude and 
if appropriate boundary conditions are used in latitude. It is currently being used in the global 
multi-level primitive equation semi-Lagrangian semi-implicit model of Bates et al. (1992) at the 
Goddard Laboratory for Atmospheres and in the adjoint model development (Li et al., 1991) at the 
Florida State University. While the solver is accurate and flexible, it takes only a few percent of 
the CPU time taken by the multi-level model dynamics. 
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VII. Appendix 


SUBROUTINE ELLSOL(IMA, IM, JM, AE, INIT1, INIT2, DIFF 
*, Cl, C2 , C3 , C4 , C5, C6, FI, FO, WSV, WRK, IX) 

C 

r * ********************************************* *************************** 

c* 

(IMA, JM) is the dimension of the forcing FI (input) and the * 

solution FO (output) . IM is less than or equal to IMA and 
represents the physical domain (in longitude) over which the 
solution is computed. AE is the radius of the sphere. * 

INIT1 , INIT2 and DIFF are logical variables. * 

When INIT1 is true, the value passed for INIT2 is ignored 

Cl, C2,...C6 are arrays of dimension JM containing the coefficients* 
of the elliptic equation. ^ 

WSV, WRK and IX are other miscellaneous work arrays. * 


C* 

C* 

C* 

C* 

C* 

C* 

C* 

C* 

C* 

c* 
c* 
c* 
c* 
c* 
c* 

c* 
c* 
c* 
c* 
c* 
c* 
c* 
c* 

£*****★**★***★*★★*★★★***************************************************** 

c 

DIMENSION FI ( IMA, JM) , FO‘(IMA,JM) 

Cl ( JM) , C2 ( JM) , C3 ( JM) , C4 ( JM) , C5(JM), C6 ( JM) 

*' WSV(l), WRK<1), IX (IM) 


The dimension of WSV should be at least [4*IM + 9*JM + 7] . 

This array stores Constants need fed in subsequent calls to ELLSOL. 
It should not be overwritten unless INIT1 is true for the 
next call to ELLSOL. 

The dimension of WRK should be at least [14 * IM*JM] . 

The dimension of IX should be at least IM and it should not be 
overwritten unless INIT1 is true for the next call to ELLSOL. 


★ 

* 


LOGICAL INIT1 , INIT2 , DIFF 


ITRG = 3 * ( IM+2 ) + 1 

IMB2 = IM / 2 

IJM = (IMB2-1) * JM 

IJMM = IJM*22 + JM 

IJM2 = IJMM + (IM+2) * (JM-2) 


C 


C 


c 


c 


CALL DIRSOL(IMA, IM, JM; IMB2 , ITRG, AE, INIT1, INIT2, DIFF 
Cl, C2 , C3 , C4 , C5, C6, FI, FO, IX 


WSV(l), WSV ( JM+1) , WSV ( JM*2+1 ) , WSV(JM*3+1) 
WSV ( JM*4+1 ) , WSV ( JM*5+1 ) , WSV(JM*6+1), WSV(JM*7+1) 
WSV( JM*9+1) , WSV ( JM*9+IMB2+1) , WSV ( JM*9+IM+1 ) 


WRK ( 1 ) , 

WRK ( IJM*4+1 ) , 
WRK ( I JM*8+ 1 ) , 
WRK ( IJM*12+ 1 ) 
WRK ( IJM*16+1 ) 
WRK { IJM*20+1 ) 
WRK { IJM2+1 ) , 
WRK ( JM*3+1 ) , 


WRK (I JM+1) , 

WRK ( IJM*5+1 ) , 
WRK ( IJM*9+1 ) , 

, WRK ( IJM* 13 + 1 ) , 
, WRK ( IJM* 17 + 1 ) , 
, WRK ( I JM*2 1+1 ) , 
WRK ( 1 ) , 

WRK ( JM*4+1 ) ) 


WRK ( IJM*2+1 ) , 
WRK (IJM* 6+1) , 
WRK ( IJM*10+1 ) , 
WRK ( IJM* 14+1 ) , 
WRK ( IJM* 18+1) , 
WRK ( IJM*22+1 ) , 
WRK (JM+1) , 


WRK ( IJM*3+1 ) 
WRK ( IJM*7+1 ) 
WRK ( IJM* 11+1 ) 
WRK ( IJM*15+1 ) 
WRK ( IJM*19+1 ) 
WRK ( IJMM+1 ) 
WRK ( JM*2+1 ) 


RETURN 

END 
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non o ono n o o on non 


SUBROUTINE DIRSOL(IMA, IM, JM, IMB2, ITRG, AE, INIT1, INIT2 , DIFF 
*, Cl, C2 , C3 , C4 , C5, C6, FI, FO, IX 

*, CDS1 , CDN1 , CDS2 , CDN2 , CDC1, CDC2 , CDC3 , CPH 

*, WV, WVSQ, TR 


* 

AN1 , BN1 

, CN1 , 

DN1 , 

FN1 , 

PN1, 

QN1, 

RN1 

* 

AN2 , BN2 

, CN2 , 

DN2 , 

FN2, 

PN2 , 

QN2 , 

RN2 

★ 

AL1 , AL2 

, GM1 , 

GM2 , 

BT1 , 

BT2 



★ 

t 

FM, A, B 

, AN, 

BN, 

CN, 

DN, 

FN) 


ITRG SHOULD AT 

LEAST BE 3* 

(IM+2) 

+ 1 






INTEGER FORWARD, BACKWARD 
PARAMETER ( FORWARD= - 1 , BACKWARD= 1 ) 

PARAMETER (PI=3 .1415926535898, TWOPI=PI+PI, PIO2=0.5*PI) 


DIMENSION FI {IMA, JM) , FO(IMA,JM) 

*, Cl { JM) , C2 ( JM) , C3 ( JM) , C4 { JM) , C5(JM), C6 { JM) 

*, TR { ITRG) , IX (IM), WVSQ ( IMB2 ) , WV ( IMB2 ) 


DIMENSION AN1 (IMB2-1, JM) , 
AN2 (IMB2-1, JM) , 
QN1 ( IMB2-1, JM) , 
QN2 (IMB2-1, JM) , 
DN1 ( IMB2-1 , JM) , 
FN1 (IMB2-1, JM) , 


BN1 (IMB2-1, JM) , 
BN2 (IMB2-1, JM) , 
PN1 (IMB2-1, JM) , 
PN2 (IMB2-1, JM) , 
DN2 (IMB2-1, JM) 
FN2 ( IMB2-1 , JM) 


CN1 ( IMB2 - 1 , JM ) 
CN2 (IMB2-1, JM) 
RN1 {IMB2-1, JM) 
RN2 { IMB2-1 , JM) 


ALKIMB2-1, JM) , AL2 { IMB2-1 , JM) 
GM1 (IMB2-1, JM) , GM2 (IMB2-1, JM) 
BT1 (IMB2-1, JM) , BT2 (IMB2-1, JM) 


DIMENSION AN (JM) , BN ( JM) , CN(JM), DN(JM), FN(JM) 

, CDNl(JM), CDSl(JM), CDN2 ( JM) , CDS2 { JM) 

, CDCl(JM), CDC2 ( JM) , CDC3 { JM) , FM{JM) , CPH(JM,2) 


DIMENSION A ( IM+2 , JM-2 ) , B(IM*2,JM-2) 
LOGICAL INIT1 , INIT2, DIFF, FIRST, SETCO 


DATA JMM1 , JMM2 , IMP1, IMP2, LEN/5*0/, AREA/ 0.0/ 

DATA FIM, DLM, DPH, RDLM, RDPH, DY, AESQ, CDNP, CDSP/9*0.0/ 
DATA FIRST/. FALSE./ 


SAVE 


INITIALIZATION ON FIRST CALL 


IF ( INIT1 ) THEN 


CALL 

FFTFAX { IM, IX, TR) 

JMM1 

- 

JM - 

1 

JMM2 

= 

JM - 

2 

IMP1 

= 

IM + 

1 

IMP2 

= 

IM + 

2 

LEN 

= 

IMB2 

- 1 

FIM 

= 

1.0 / 

FLOAT (IM) 

DLM 

= 

TWO PI 

* FIM 

DPH 

= 

PI / 

FLOAT (JMM1) 

RDLM 


= 1.0 

/ DLM 

RDPH 


= 1.0 

/ DPH 
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DY ' = AE * DPH 
AESQ = AE * AE 
C 

DO 10 J=1 , JM 

TEM = -PI02 + ( J-l ) *DPH 

CPH ( J, 1 ) = COS (TEM + 0 . 5*DPH) 

CPH ( J, 2 ) = COS (TEM) 

10 CONTINUE 
C 

CPH (1,2) = 0.0 

CPH ( JM, 2 ) =0.0 
FIRST = -TRUE. 

INIT2 = .TRUE. 

ENDIF 

C 

IF ( INIT2 ) THEN 
DO 20 J=2 , JMM1 

TEM1 =1.0 / (CPH ( J, 2 ) * 2.0 * DY) 

TEM2 =1.0 / (CPH ( J, 2 ) * DY*DY) 

C 

CDNl(J) = C5 ( J) * CPH ( J, 1 ) * TEM1 

CDS 1 ( J ) = C5 { J-l ) * CPH (J-l , 1 ) * TEM1 

CDN2 ( J ) = C3 (J) * CPH ( J, 1 ) * TEM2 

CDS2 ( J) = C3(J-lj * CPH (J-l, 1) * TEM2 

C CDC1(J) = C1(J) / (CPH ( J , 2 ) * CPH ( J, 2 ) * AESQ) 

CDC2 ( J) = C4 { J) / (AE * CPH ( J , 2 ) ) 

CDC3 ( J) = C2 ( J) / (AE * CPH ( J , 2 ) * 2.0 * DY) 

C 

20 CONTINUE 

CDCl(l) = 4.0 * C3(l) / (DY*DY) 

CDCl(JM) = 4.0 * C3 ( JMM1 ) / (DY*DY) 

CDNP = C5 ( JMM1 ) * 2.0 / DY 

CDSP = C5(l ) * 2.0 / DY 

C 

IF (DIFF) THEN 
DO 30 1=1, IMB2 

WVSQ(I) = (SIN(0 . 5*I*DLM) * (RDLM * 2.0)) ** 2 
WV (I) = SIN ( I *DLM) * RDLM 
30 CONTINUE 

ELSE 

DO 40 1=1, IMB2 
WVSQ(I) =1*1 
WV (I) = I 
40 CONTINUE 

ENDIF 
ENDIF 
C 

DO 25 J = 1 , JM 

IF (C6 (J) .NE. 0.0) GO TO 26 

25 CONTINUE 
SETC0 = .TRUE. 

CALL REMGLM ( IMA, IM, JM, FI, CPH, FIRST, AREA) 

GO TO 27 

26 SETC0 = .FALSE. 

27 CONTINUE 
C 

DO 42 J=2 , JMM1 

FM( J) = 0.0 

DO 42' 1 = 1, IM 

FM( J) = FM ( J ) + FI ( I , J) 

42 CONTINUE 
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DO 43 J=2 , JMM1 
FM(J) ='FM{J) •* FIM 
43 CONTINUE 

DO 45 J=2 , JMM1 
DO 45 1=1, IM 

A ( I , J-l ) = FI ( I , J) - FM ( J) 
A ( I , J-l ) = FI ( I, J) 

45 CONTINUE 


50 


C 

C 


C 

c 


DO 50 J=1 , JMM2 
A ( IM+1 , J ) = A ( 1 , J) 

A( IM+2 , J) = A(2 , J) 

CONTINUE 

CALL RFFTMLT ( A , B,TR, IX, 1, IMP2 , IM, JMM2 , FORWARD) 


DO 

DO 


60 J=2 , JMM1 
60 1=1, LEN 


ANl (I, J) 


CDS2 ( J ) - 

CDS1 (J) 

BN1 (I, J) 

= 

- (CDS2 ( J ) + 

CDN2 ( J ) + WVSQ ( I ) 

★ 


+ CDS1(J) - 

CDN1 (J) - C6 (J) ) 

CN1 (I, J) 


CDN2 ( J ) + 

CDN1 (J) 

AN2 ( I , J ) 


CDC3 ( J) * 

WV(I) 

BN2 (I, J) 

= 

- CDC2 ( J ) * 

WV(I) 

CN2 (I, J) 

= 

- CDC3.( J) * 

WV(I) 

DN1 ( I , J ) 

= 

A ( 1+ 1+ 1 , J- 

-1) 

QN1 { I , J) 

= 

CN2 ( I , J ) 


PN1 (I, J) 

= 

- BN2 ( I , J ) 


RN1 (I, J) 

= 

AN2 ( I , J ) 


QN2 ( I , J ) 

= 

ANl (I, J) 


PN2 (I, J) 

= 

BN1 ( I , J ) 


RN2 (I, J) 

= 

CN1 ( I , J ) 


DN2 ( I, J) 
60 CONTINUE 

= 

A<I+I+2, J-l) 



CDC1 (J) 


CALL S020DE ( LEN, JM, AN1 , BN1 , CN1 , DN1 , AN2 , BN2 , CN2 , DN2 
*, QN1, PN1 , RN1 , QN2, PN2 , RN2 , FN1 , FN2 

*, AL1 , AL2 , GM1 , GM2 , BT1 , BT2 ) 


70 


DO 70 J=2 , JMM1 
DO 70 1=1, LEN 
A ( 1+ 1 + 1 , J-l ) = FN1 ( I , J) 
A(I+I+2, J-l) = FN2 (I, J) 
CONTINUE 


80 


DO 80 J=2 , JMM1 

AN ( J) = CDS2 ( J ) - CDS1 { J) 

BN ( J) = - (CDS2 ( J) + CDN2 { J) 

k 

CN ( J) = CDN2 ( J) + CDN1(J) 

DN ( J) = A ( IMP1 , J-l ) 

CONTINUE 


+ CDSl(J) - CDNl(J) 

+ WVSQ { IMB2 ) * CDC1(J) 


C6 ( J ) ) 


C 

c 


c 

Cttff ft 
90 


CALL SOIODE ( JM, AN, BN, CN, DN, FN, AL1 , BT1 ) 

DO 90 J=2 , JMM1 
A( IMP1 , J-l ) = FN ( J ) 

BN ( J) = - (AN ( J) + CN ( J) - C6 ( J) ) 

BN ( J 5 = - {CDS2 ( J ) + CDN2 ( J) + CDS1(J) - CDN1(J) 

DN ( J) = A ( 1 , J-l ) 

DN ( J) = FM( J) 

CONTINUE 


C6 ( J ) ) 
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BN ( 1 ) = - <-C6(l) + CDCl(l) - CDSP) 

BN(JM) ^ - { -C6 ( JM) + CDCl(JM) + CDNP) 

AN(JM) = CDC1 ( JM) - CDNP 
CN ( 1 ) = CDCl(l) + CDSP 

DN ( 1 ) = FI (1,1) 

DN(JM) = FI ( 1 , JM) 

C 

CALL S010D2 ( JM, AN, BN, CN, DN, FN, AL1 , BT1) 

C 

IF (SETCO ) THEN 

FLDM = ( FN ( 1 ) + FN ( JM ) ) * ( AREA*CPH ( 1 , 1 ) *0 . 25 ) 
DO 92 J=2 , JMM1 

FLDM = FLDM + FN(J) * (CPH ( J, 2 ) *AREA) 

92 CONTINUE 

DO 94 J = 1 , JM 
FN ( J) = FN ( J) - FLDM 
94 CONTINUE 

END IF 
C 

DO 100 J=2 , JMM1 
C A ( 1 , J-l) = FN ( J ) 

A (2 , J-l) = 0.0 
A ( IMP2 , J-l ) = 0.0 
100 CONTINUE 
C 

CALL RFFTMLT (A, B, TR, IX, 1 , IMP2 , IM, JMM2 , BACKWARD) 

C 

DO 120 J=1 , JMM2 
DO 110 1=1, IM 
C FO ( I , J+l) = A ( I , J) 

FO { I , J+l ) = A ( I , J ) + FN ( J+l ) 

110 CONTINUE 
120 CONTINUE 
C 

DO 130 1=1, IM 
FO (1,1) = FN ( 1 ) 

FO ( I , JM ) = FN ( JM) 

130 CONTINUE 
C 

RETURN 

END 


c 


C 


C 


c 

c 


c 


SUBROUTINE S020DE(LEN, JM, 

QN1, PN1 , 
AL1, AL2 , 


AN1 , BN1 , CN1, DN1 , AN2 , BN2 , CN2 , DN2 
RN1 , QN2, PN2 , RN2, FN1 , FN2 
GM1 , GM2 , BT1 , BT2 ) 


DIMENSION AN1 (LEN, JM) , BN1 ( LEN , JM ) , 
*, AN2 (LEN, JM) , BN2 (LEN, JM) , 
*, QN1 (LEN, JM) , PN1 (LEN, JM) , 
*, QN2 (LEN, JM) , PN2(LEN,JM), 
*, DN1 (LEN, JM) , DN2 (LEN, JM) , 


CNMLEN, JM) 
CN2(LEN, JM) 
RN1(LEN, JM) 
RN2(LEN, JM) 
FN1 (LEN, JM) , 


FN2 (LEN, JM) 


DIMENSION AL1 (LEN, JM) , AL2(LEN,JM), GM1 (LEN, JM) , GM2(LEN,JM) 
*, BT1 (LEN, JM) , BT2 (LEN, JM) 


JMM1 = JM - 1 


DO 10 1=1, LEN 
AL1 (I, 1) = 0.0 
AL2 (1,1) = 0.0 
BT 1(1,1) =0.0 
BT2 ( t , 1 ) = 0.0 
GM1 (1,1) =0.0 
GM2 (1,1) =0.0 
10 CONTINUE 
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c 


c 


c 


c 

c 


c 


c 


c 


c 


c 


DO 30 J=2 , JMM1 
JM1 = J - 1 


DO 20 1=1, LEN 
SA1 = AN1 ( I , J ) 
SA2 = AN1 ( I , J) 


AL1 ( I , JM1 ) + AN2(I,J) 
AL2 ( I , JM1 ) + AN2(I,J) 


GM1 ( I , JM1 ) 
GM2 { I . JM1 ) 


SB1 = QN1 ( I , J) 
SD2 = QNl(I.J) 


AL1 ( I , JM1 ) + QN2(I,J) 
AL2 (I, JM1) + QN2 (I, J) 


GM1 ( I , JM1 ) 
GM2 (I, JM1) 


SD1 = DN1 ( I , J ) - AN1 ( I , J) 
SD2 = DN2 ( I , J ) - QN1 ( I , J) 


BT1 (I, JM1) - AN2 (I, J) * 
BT1(I,JM1) - QN2 ( I , J ) * 


RM = 1.0 / (SA1*SB2 - SB1*SA2 ) 


AL1 

( I , 

, J) 

= - RM 

AL2 

(I, 

, J) 

= - RM 

BT1 

(I, 

, J) 

= RM 

GM1 

(I, 

r J) 

= RM 

GM2 

(I, 

r J) 

= RM 

BT2 

(I, 

, J) 

= - RM 


* (CN1 ( I , J) * SB2 

* (CN2 ( I , J) * SB2 

* (SD1 * SB2 

* ( CN1 ( I , J ) * SB1 

* (CN2 ( I , J ) * SB1 

* (SD1 * SB1 


- RN1 ( I , J } 

* 

SA2 ) 

- RN2 ( I , J) 

★ 

SA2 ) 

- SD2 

* 

SA2 ) 

- RN1 { I , J ) 

* 

SA1 ) 

- RN2 ( I , J) 

* 

SA1) 

- SD2 

★ 

SA1) 


20 CONTINUE 
30 CONTINUE 


DO 40 1=1, LEN 
FN1 ( I , JM) =0.0 
FN2 ( I , JM ) =0.0 
40 CONTINUE 


50 

60 


DO 60 J=JMM1 , 1, -1 
DO 50 1=1, LEN 

FN1 ( I , J) = AL1 ( I , J ) *FN1 ( I , J+l ) 
FN2 ( I , J) = GM1 (I, J) *FN1 (I, J+l) 
CONTINUE 
CONTINUE 


+ AL2 ( I , J ) * FN2 ( I , J+ 1 ) 
+ GM2 ( I , J ) *FN2 ( I , J+ 1 ) 


RETURN 

END 


C 

c 

c 

c 


c 


c 


c 


c 


SUBROUTINE S010DE(JM, AN, BN, CN, DN, FN, AL, BT) 

DIMENSION AN (JM) , BN ( JM) , CN(JM), DN ( JM) , FN ( JM) 

DIMENSION AL(JM) , BT(JM) 

JMM1 = JM - 1 

AL ( 1 ) = 0.0 

BT { 1 ) = 0.0 

FN ( JM) =0.0 

DO 10 J =2 , JMM1 
JM1 = J - 1 

SA1 = 1.0 / ( AN ( J ) * AL(JMl) + BN(J)) 

AL ( J ) = - CN ( J) * SA1 

BT ( J) = ( DN ( J ) - AN ( J) * BT(JMl)) * SA1 

10 CONTINUE 

DO 20 J=JMM1 , 1, -1 

FN ( J ) = AL ( J ) *FN ( J+l ) + BT ( J ) 

20 CONTINUE 

RETURN 

END 
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+ BN1 { I , J ) 
+ BN2 ( I , J ) 

+ PN1 ( I, J) 
+ PN2 ( I , J ) 

BT2 (I, JM1) 
BT2 (I, JM1) 


+ BT1 ( I , J ) 
+ BT2 ( I , J ) 



SUBROUTINE S010D2 ( JM ( AN, BN, CN, DN, FN, AL, BT) 

DIMENSION AN (JM) , BN(JM), CN(JM), DN(JM), FN(JM) 

C 

DIMENSION AL ( JM) , BT ( JM) 

C 

JMM1 = JM - 1 
C 

TEM = 1.0 / BN ( 1 ) 

AL ( 1 ) = - CN ( 1 ) * TEM 

BT ( 1 ) = DN ( 1 ) * TEM 

C 

DO 10 J=2 , JMM1 
JM1 = J - 1 

SA1 = 1.0 / (AN ( J) * AL(JMl) + BN(J)) 

AL{ J) = - CN ( J) * SA1 

BT ( J) = ( DN ( J ) - AN ( J) * BT(JMl)) * SA1 
10 CONTINUE 

C FN ( JM) = (DN(JM) - AN(JM) *BT(JMM1) ) / (BN(JM) + AN ( JM) *AL ( JMM1 ) ) 

C 

DO 20 J=JMM1 ,1,-1 

FN ( J) = AL ( J) *FN ( J+l ) + BT ( J) 

20 CONTINUE 
C 

RETURN 

END 

SUBROUTINE REMGLM ( IMA, IM, JM, FLD, CPH, FIRST, AREA) 

C 

DIMENSION FLD ( IMA, JM) , CPH(JM,2) 

LOGICAL FIRST 

DATA JMM1/0/ , FIM/ 0.0/ 

SAVE 

C 

IF (FIRST) THEN 
JMM1 = JM - ) 

FIM = 1.0 / FLOAT (IM) 

AREA =0.0 

DO 10 J=2 , JMM1 

AREA = AREA + CPH(J,2) 

10 CONTINUE 

AREA = 1.0 / (AREA + CPH(1,1) * 0.5) 

FIRST = .FALSE. 

END IF 

FLDM = ( FLD (1,1) + FLD(1,JM)) * ( AREA*CPH { 1 , 1 ) *0 . 25 ) 

DO 20 J =2 , JMM1 
DO 20 1=1, IM 

FLDM = FLDM + FLD(I,J) * (CPH ( J , 2 ) *AREA*FIM) 

20 CONTINUE 
C 

DO 30 1=1, IMA* JM 
FLD (I, 1) = FLD (I, 1) - FLDM 
30 CONTINUE 
C 

RETURN 

END 
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VIII. List of Figures 

Figure 1. Contour map of the prescribed solution (Eq. 34) to the elliptic equation with (IJ) = 
(96,49). The contour interval is 300. 

Figure 2. Numerical solution in case 1 for (a) the direct solver and (b) the multigrid solver with 
(I J) = (96,49). The contour interval is 300. 

Figure 3. As in Fig. 2 except for the difference between the numerical and prescribed solutions. 

The contour interval is 1. In (a) the difference is multiplied by a factor of 10** and in 
(b) the difference is multiplied by a factor of 10. 

Figure 4. CPU time (t) taken for 500 calls to the direct and to the multigrid solvers on a single 
processor of the CRAY YMP as a function of the total number (N) of grid points. In 
(a) the axes are linear while in (b) the axes are logarithmic. 

Figure 5. As in Fig. 4b except for the present and the FISHPAK direct (subroutine HWSSSP) 
solvers. 
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